From discontinuous to continuous F0 modelling in HMM-based speech synthesis
Abstract
The accurate modelling of fundamental frequency, or F0, in HMM-based speech synthesis is a critical factor in achieving high quality speech. However, it is also difficult because F0 values are normally considered to depend on a binary voicing decision, such that they are continuous in voiced regions and undefined in unvoiced regions. A widely used solution is the multi-space probability distribution HMM (MSDHMM), which directly models discontinuous F0 observations. An alternative solution, continuous F0 modelling, has recently been proposed and shown to be more effective in achieving natural synthesised speech. Here, continuous F0 observations are assumed to always exist and hence they can be modelled by standard HMMs. This paper describes a general mathematical framework for discontinuous F0 modelling, of which MSDHMM is a special case, and compares it to continuous F0 modelling. Various aspects of continuous F0 modelling, namely the use of a single F0 stream, globally tied distributions (GTD) and the assumption of a continuous unvoiced F0, are discussed in theory and examined in experiments. Both objective measures and subjective listening tests demonstrate that the introduction of continuous unvoiced F0 is vital for achieving speech quality improvement.
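The difference between a discontinuous and a continuous F0 observation sequence can be illustrated with a minimal sketch. The abstract does not say how unvoiced F0 values are obtained, so the linear interpolation used here is purely an assumption for illustration, as are the function name `to_continuous_f0` and the convention that unvoiced frames are marked with `0.0`.

```python
from typing import List, Tuple


def to_continuous_f0(f0: List[float], unvoiced: float = 0.0) -> Tuple[List[float], List[bool]]:
    """Turn a discontinuous F0 track into a continuous one plus voicing flags.

    Assumption (not from the paper): unvoiced frames are filled by linear
    interpolation between the neighbouring voiced frames, and frames before
    the first or after the last voiced frame copy the nearest voiced value.
    """
    voiced = [value != unvoiced for value in f0]
    voiced_idx = [i for i, is_voiced in enumerate(voiced) if is_voiced]
    if not voiced_idx:
        raise ValueError("no voiced frames to interpolate from")

    continuous = list(f0)
    first, last = voiced_idx[0], voiced_idx[-1]

    # Hold the nearest voiced value at the edges of the utterance.
    for i in range(first):
        continuous[i] = f0[first]
    for i in range(last + 1, len(f0)):
        continuous[i] = f0[last]

    # Interpolate linearly inside each unvoiced gap.
    for left, right in zip(voiced_idx, voiced_idx[1:]):
        for i in range(left + 1, right):
            weight = (i - left) / (right - left)
            continuous[i] = (1.0 - weight) * f0[left] + weight * f0[right]

    return continuous, voiced


# Discontinuous track: 0.0 marks frames where F0 is undefined (unvoiced).
track = [0.0, 100.0, 0.0, 0.0, 130.0, 0.0]
continuous_track, voicing = to_continuous_f0(track)
```

In this sketch the continuous track always has a value, so it could be modelled by a standard HMM stream, while the voicing flags carry the binary decision separately. An MSD-style model would instead keep the original track and treat the unvoiced frames as observations from a separate, discrete space.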
Similar resources
Discontinuous Observation HMM for Prosodic-Event-Based F0 Generation
This paper examines F0 modeling and generation techniques for spontaneous speech synthesis. In the previous study, we proposed a prosodic-unit HMM where the synthesis unit is defined as a segment between two prosodic events represented by a ToBI label framework. To take advantage of the prosodic-unit HMM, continuous F0 sequences must be modeled from discontinuous F0 data including unvoiced r...
Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in HMM-based Speech Synthesis
In parametric text-to-speech synthesis using hidden Markov models (HMMs), the modelling of the fundamental frequency (F0) parameter is important because it has a direct effect on the prosody of synthetic speech. F0 is typically modelled by a discrete distribution for unvoiced speech and a continuous distribution for voiced speech, by using a multi-space distribution (MSD). However, F0 modelling using MSD-HMM i...
Objective evaluation of HMM-based speech synthesis system using Kullback-Leibler divergence
In this paper, we propose a new objective evaluation method for hidden Markov model (HMM)-based speech synthesis using Kullback-Leibler divergence (KLD). The KLD is used to measure the difference between the probability density functions (PDFs) of the acoustic feature vectors extracted from natural training and synthetic speech data. For the evaluation, a Gaussian mixture model (GMM) is used to m...
Corpus-Based Hidden Markov Modelling of the Fundamental Frequency of Lithuanian
This paper presents a corpus-driven approach to building a computational model of fundamental frequency, or F0, for the Lithuanian language. The model was obtained by training the HMM-based speech synthesis system HTS on six hours of speech coming from multiple speakers. Several gender-specific models, using different parameters and different contextual factors, were investigated. The models we...
Improved generation of prosodic features in HMM-based Mandarin speech synthesis
The HMM-based text-to-speech (TTS) system can produce high quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the prosodic features, like F0 and duration trajectories, generated by HMM-based speech synthesis are often excessively smoothed and lack prosodic variance. In HMM-based TTS, durations are typically modeled statistically using state duration probabili...
Publication date: 2010